Abstract. According to a previous result by S. V. Avgustinovich and
the author, each factorial language admits a unique canonical decomposition
into a catenation of factorial languages. In this paper, we analyze the
form of the canonical decomposition of a catenation of two factorial
languages whose canonical decompositions are given.

1. Introduction 

This paper continues the study of decompositions of factorial languages started
in [1, 2] and inspired by the field of language equations and algebraic operations on
languages in general (see, e.g., [8] and references therein). As the development
of the theory shows, even language expressions in which the only operation is
catenation prove very difficult to work with. It seems that nothing resembling
Makanin's algorithm for word equations (see, e.g., [4]) can appear for language
equations with catenation. Even the simplest questions tend to have very complicated
answers. In particular, the maximal solution X of the commutation equation

LX = XL 

may be arbitrarily complicated: as shown by Kunc [6], even if the language
L is finite, the maximal language X commuting with it need not be recursively
enumerable. This situation contrasts with that for words, since xy = yx for some
words x and y implies that $x = z^n$ and $y = z^m$ for some word z and $n, m \geq 0$.
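
The statement for words can be checked mechanically. The following is a minimal sketch, not taken from the cited literature: the helper names `common_root` and `is_power_of` are hypothetical, and the sketch relies on the standard fact that two commuting words are both powers of their common prefix of length gcd(|x|, |y|).

```python
from itertools import product
from math import gcd

def common_root(x: str, y: str):
    """Return a word z with x = z^n and y = z^m if xy == yx, else None."""
    if x + y != y + x:
        return None
    g = gcd(len(x), len(y))  # gcd(0, 0) == 0, which yields the empty word
    return (x + y)[:g]

def is_power_of(w: str, z: str) -> bool:
    """True if w = z^n for some n >= 0."""
    if not z:
        return not w
    return len(w) % len(z) == 0 and w == z * (len(w) // len(z))

# Exhaustive check over all pairs of words of length <= 4 on {a, b}.
words = ["".join(p) for n in range(5) for p in product("ab", repeat=n)]
for x in words:
    for y in words:
        z = common_root(x, y)
        if z is not None:
            assert is_power_of(x, z) and is_power_of(y, z)
```

No comparably simple test exists for languages, which is the point of the result of Kunc quoted above.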

In some sense, the problems of catenation of languages are due to the fact that
no unique factorization theorem holds for it: as shown by Salomaa
and Yu [9], even a finite unary language can admit several essentially different
decompositions into a catenation of smaller languages, and an infinite language may
have no decomposition into prime languages at all; here a language L is called prime
if $L = L_1 L_2$ implies that $L_1 = \{\lambda\}$, where $\lambda$ is the empty word, and $L_2 = L$, or
vice versa.

To avoid ambiguity of this kind, we restrict ourselves to factorial languages. This 
family is large and widely investigated since it includes, e. g., languages of factors 
of finite or infinite words and languages avoiding patterns (in the sense of [3]). We 
can also consider the factorial closure of an arbitrary language. Furthermore, the
class of factorial languages is closed under catenation, union, and intersection,
and constitutes a monoid with respect to catenation.

Decompositions of a factorial language into a catenation of factorial languages
need not be unique either: for example, $a^*b^* = (a^* + b^*)b^* = a^*(a^* + b^*)$ (here and below
(+) denotes union). However, as proved in [1], we can define the notion of the
canonical decomposition of a factorial language, which always exists and is unique.
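
The three decompositions of a*b* can be compared on all words up to a fixed length. The sketch below is only an illustration: the bound `N` and the helpers `star` and `cat` are assumptions of the sketch, and languages are represented by their finite truncations. Truncation commutes with catenation, since a product xy of length at most N has both factors of length at most N.

```python
from itertools import product

N = 6  # length bound; every language is truncated to words of length <= N

def star(letters: str) -> set:
    """All words over `letters` of length <= N (truncation of letters*)."""
    return {"".join(p) for n in range(N + 1) for p in product(letters, repeat=n)}

def cat(X: set, Y: set) -> set:
    """Catenation XY, truncated to length <= N."""
    return {x + y for x in X for y in Y if len(x) + len(y) <= N}

a_star, b_star = star("a"), star("b")
lhs = cat(a_star, b_star)            # a*b*
mid = cat(a_star | b_star, b_star)   # (a* + b*) b*
rhs = cat(a_star, a_star | b_star)   # a* (a* + b*)
```

All three sets coincide, although the factor a* + b* strictly contains a* and b*.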

In this paper, we continue the investigation of canonical decompositions of factorial
languages and solve the following general problem: Given canonical decompositions
of languages A and B, what is the canonical decomposition of their catenation AB?

Besides being of independent interest, the answer to this question may help to
solve equations on factorial languages. Indeed, equal languages have equal canonical
decompositions, and these canonical decompositions may be compared as words.
So, techniques valid for words can be applied to them.

Thus, this paper may be considered as a description of a tool helpful for solving
equations on factorial languages.

2. Definitions and previous results 

Let $\Sigma$ be a finite alphabet, and $L \subseteq \Sigma^*$ be a language on it. A word $u \in \Sigma^*$ is
called a factor of a word $v \in \Sigma^*$ if $v = sut$ for some (possibly empty) words s and
t. The set of all factors of words of a language L is denoted by $\mathrm{Fac}(L)$. Clearly,
$\mathrm{Fac}(\mathrm{Fac}(L)) = \mathrm{Fac}(L)$, so that $\mathrm{Fac}(L)$ may be called the factorial closure of L.

A language L is called factorial if $L = \mathrm{Fac}(L)$. In particular, each factorial
language contains the empty word, denoted by $\lambda$. In what follows, we consider only
factorial languages.
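
For a finite language, the factorial closure can be computed directly from the definition. A minimal sketch follows; the function names `fac` and `is_factorial` are chosen here for illustration only.

```python
def fac(L):
    """Factorial closure of a finite language: the set of all factors of its words."""
    return {w[i:j] for w in L
            for i in range(len(w) + 1)
            for j in range(i, len(w) + 1)}

def is_factorial(L) -> bool:
    """A language is factorial if it coincides with its factorial closure."""
    return set(L) == fac(L)

closure = fac({"abc"})
```

Applying `fac` twice gives nothing new, in agreement with the identity Fac(Fac(L)) = Fac(L).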

The catenation of languages is an associative operation defined by 

$XY = \{xy \mid x \in X,\ y \in Y\}.$

Clearly, languages constitute a monoid with respect to catenation, and its unit is
the language $\{\lambda\}$, where $\lambda$ is the empty word. It is also clear that factorial languages
form a submonoid of that monoid, since the catenation of two factorial languages
is factorial.
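
These monoid properties can be observed on small finite languages. The sketch below is an illustration on one hand-picked pair of languages, not a proof; `fac` and `cat` are hypothetical helper names.

```python
def fac(L):
    """Factorial closure of a finite language."""
    return {w[i:j] for w in L
            for i in range(len(w) + 1)
            for j in range(i, len(w) + 1)}

def cat(X, Y):
    """Catenation of two finite languages."""
    return {x + y for x in X for y in Y}

X = fac({"ab"})        # {"", "a", "b", "ab"}
Y = fac({"ba", "c"})   # {"", "b", "a", "ba", "c"}
XY = cat(X, Y)
unit = {""}            # the language consisting of the empty word only
```

Here `XY` is again closed under taking factors, `unit` acts as the identity, and catenation is associative.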

A factorial language L is called indecomposable if L = XY implies L = X or 
L = Y for all factorial languages X and Y. 

Lemma 1. [1] For each subalphabet $\Delta \subseteq \Sigma$, the language $\Delta^*$ is indecomposable.

Other examples of indecomposable languages discussed in [1] include languages
of factors of recurrent infinite words, etc.

A decomposition $L = L_1 \cdots L_n$ into factorial languages $L_1, \ldots, L_n$ is called minimal
if

• $L = \{\lambda\}$ implies $n = 1$ and $L_1 = \{\lambda\}$;

• if $L \neq \{\lambda\}$, then for $i = 1, \ldots, n$ we have $L_i \neq \{\lambda\}$ and
$L \neq L_1 \cdots L_{i-1} L'_i L_{i+1} \cdots L_n$ for any factorial language $L'_i \subsetneq L_i$.

A minimal decomposition into indecomposable factorial languages is called canonical.

Theorem 1. [1] A canonical decomposition of each factorial language L exists and
is unique.

In what follows, we shall denote the canonical decomposition of L by $\overline{L}$. Note
that a canonical decomposition can be considered as a word on the alphabet $\mathcal{F}$ of
all indecomposable factorial languages. In what follows, ($\doteq$) will denote equality of
elements of $\mathcal{F}^*$; this notation will be used to compare canonical decompositions.

All examples of factorial languages we shall consider in this paper will be regular,
just because regular languages are easy to deal with. Note that the factorial closure
of a regular language is always regular (which is a classical exercise). We have
also proved

Theorem 2. [2] If L is a regular factorial language, then all entries of $\overline{L}$ are also
regular.

3. Preliminary results 

Suppose that we are given two factorial languages, A and B, on an alphabet
$\Sigma$, and know their canonical decompositions $\overline{A}$ and $\overline{B}$. Our goal is to describe the
canonical decomposition $\overline{AB}$, and the main result of the paper, Theorem 3, will
give such a description. To state Theorem 3, we need to define two subalphabets of
$\Sigma$, namely, $\Pi$ and $\Delta$.

For a factorial language L, let us define

$\Pi(L) = \{a \in \Sigma \mid La \subseteq L\}$,

and

$\Delta(L) = \{a \in \Sigma \mid aL \subseteq L\}$.

Thus, if we take any word $u \in L$, we can extend it to the left by any word from
$\Delta^*(L)$ and to the right by any word from $\Pi^*(L)$ to get a word from L. In other
words, $L = \Delta^*(L) L \Pi^*(L)$, and $\Pi(L)$ and $\Delta(L)$ are defined as the maximal subalphabets
with this property.
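
For regular examples, both subalphabets can be estimated on a truncation of the language. The sketch below is an illustration under explicit assumptions: the bound `N` and the helpers `star`, `cat`, `pi_of` and `delta_of` are hypothetical, and the bounded test only checks extensions of words shorter than `N`, so in general it yields a superset of the true subalphabet.

```python
from itertools import product

N = 6  # every language is truncated to words of length <= N

def star(letters):
    return {"".join(p) for n in range(N + 1) for p in product(letters, repeat=n)}

def cat(X, Y):
    return {x + y for x in X for y in Y if len(x) + len(y) <= N}

def pi_of(L, alphabet):
    """Letters a such that wa is in L for every w in L with |w| < N."""
    return {a for a in alphabet if all(w + a in L for w in L if len(w) < N)}

def delta_of(L, alphabet):
    """Letters a such that aw is in L for every w in L with |w| < N."""
    return {a for a in alphabet if all(a + w in L for w in L if len(w) < N)}

L = cat(star("a"), star("b"))  # a*b*, up to length N
Pi, Delta = pi_of(L, "ab"), delta_of(L, "ab")
```

For a*b* the sketch gives Π = {b} and Δ = {a}, and catenating a* on the left and b* on the right leaves the truncated language unchanged.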

For the main result of this paper, we shall need to know the relationship between
$\Pi(A)$ (further denoted by $\Pi$) and $\Delta(B)$ (further denoted by $\Delta$). The following
lemmas explain the meaning of these subalphabets. Note that analogues of these
lemmas were proved in [1], but the lemmas are reproved here both for the sake of
completeness and for more precise wording.

Lemma 2. If $\overline{L} \doteq L_1 \cdots L_k$, then $\Pi(L) = \Pi(L_k)$ and $\Delta(L) = \Delta(L_1)$.

Proof. Let us prove the statement for $\Pi(L)$; the statement for $\Delta(L)$ is symmetric
to it.

First, $a \in \Pi(L_k)$ implies that $L_k a \subseteq L_k$ and thus $La = L_1 \cdots L_k a \subseteq L_1 \cdots L_k =
L$; so, $\Pi(L_k) \subseteq \Pi(L)$.

On the other hand, $a \in \Pi(L)$ means that $L_1 \cdots L_{k-1} v a \subseteq L$ for all $v \in L_k$. Since
$L_k$ is a factor of the canonical decomposition of L, it cannot be contracted to a
smaller factorial language $L'_k$ such that $L_1 \cdots L_{k-1} L'_k = L$. It means that for each
$v \in L_k \setminus \{\lambda\}$, there exists some word $wtv \in L$ such that $w \in L_1 \cdots L_{k-1}$, $tv \in L_k$,
and w is the longest prefix of wtv belonging to $L_1 \cdots L_{k-1}$. Since tv is not the
empty word, w is also the longest prefix from $L_1 \cdots L_{k-1}$ of the word $wtva \in L$.
We see that $tva \in L_k$ and thus $va \in L_k$ since $L_k$ is factorial. Moreover, for the
same reason $a \in L_k$, which means that $\lambda a \in L_k$ and thus $L_k a \subseteq L_k$. So, $a \in \Pi(L)$
implies $a \in \Pi(L_k)$, which was to be proved. □

Given a factorial language A and a subalphabet $\Delta \subseteq \Sigma$, let us define the factorial
language $L_\Delta(A) = \mathrm{Fac}(A \setminus \Delta A)$. So, $L_\Delta(A)$ is the subset of A containing exactly
words starting with letters from $\Sigma \setminus \Delta$ and their factors. Symmetrically, we define
the subset $R_\Delta(A)$ of A containing exactly words which end with letters from $\Sigma \setminus \Delta$
and their factors: $R_\Delta(A) = \mathrm{Fac}(A \setminus A\Delta)$.
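
Both operators can be tried out on truncated languages. The following is only a sketch under stated assumptions: the helper names `l_op`, `r_op` and `trunc` are hypothetical, and the input is enumerated up to length `N` while results are compared only up to length `n < N`, because a short word of the result may be a factor only of a longer word of A. For other languages a larger gap between `n` and `N` may be needed.

```python
from itertools import product

N = 7  # bound used to enumerate the input language
n = 6  # results are compared only on words of length <= n < N

def star(letters):
    return {"".join(p) for m in range(N + 1) for p in product(letters, repeat=m)}

def cat(X, Y):
    return {x + y for x in X for y in Y if len(x) + len(y) <= N}

def fac(L):
    return {w[i:j] for w in L
            for i in range(len(w) + 1)
            for j in range(i, len(w) + 1)}

def trunc(L):
    return {w for w in L if len(w) <= n}

def l_op(A, delta):
    """Bounded L_delta(A): factors of the words of A not starting with a letter of delta."""
    return trunc(fac({w for w in A if not w or w[0] not in delta}))

def r_op(A, delta):
    """Bounded R_delta(A): factors of the words of A not ending with a letter of delta."""
    return trunc(fac({w for w in A if not w or w[-1] not in delta}))

A = cat(star("a"), star("b"))  # a*b*, up to length N
```

For A = a*b*, the sketch gives R_{b}(A) = a* and L_{a}(A) = b*, while R_{a}(A) and L_{b}(A) both coincide with A on words of length at most `n`.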

Lemma 3. Let X and B be factorial languages on $\Sigma$. If there exists a factorial
language A such that X = AB, then there exists a unique minimal one, and it is
equal to $A' = R_{\Delta(B)}(A)$.

Proof. First of all, let us prove that $A'B = X$. The $\subseteq$ inclusion is obvious:
$A' \subseteq A$ and thus $A'B \subseteq AB = X$. To prove the $\supseteq$ inclusion, consider a word
$x \in X$, and let b be its longest suffix from B: since X = AB, we have x = ab for
some word $a \in A$. Suppose that a ends with a symbol $\delta \in \Delta(B)$; then $\delta b \in B$
by the definition of $\Delta(B)$, and b is not the longest suffix of x belonging to B. A
contradiction. Thus, $x = ab \in (A \setminus A\Delta(B))B \subseteq R_{\Delta(B)}(A)B = A'B$, and since x
was an arbitrary element of X, the $\supseteq$ inclusion (and thus the equality $X = A'B$)
is proved.

It remains to prove that $A' \subseteq Y$ for every factorial language Y such that YB =
X. Let us consider an arbitrary non-empty word $a' \in A'$. Since $A' = R_{\Delta(B)}(A)$, the
word a' is a factor of some word $sa't \in A \setminus A\Delta(B)$. Let the last letter of the word
sa't be equal to $\sigma$; then $\sigma \in \Sigma \setminus \Delta(B)$, and $a't = a''\sigma \in A$. So, $a'tB \subseteq AB = X = YB$.

For each $b \in B$, let us denote by y(b) the longest prefix of $a'tb = a''\sigma b$ belonging
to Y. Let the word b' be defined by the equality $a'tb = y(b)b'$; then $b' \in B$ since
$a'tb \in YB$.

Clearly, if y(b) is not shorter than a' for some $b \in B$, then its prefix a' belongs
to Y (since Y is factorial), and this is what we need. But if y(b) is shorter than a'
for all $b \in B$, then each word b' contains $\sigma b$ as a suffix. So, $\sigma b \in B$ for all $b \in B$
(since B is factorial), and $\sigma \in \Delta(B)$ by the definition of $\Delta(B)$. A contradiction. So,
$a' \in Y$ for all $a' \in A'$, and A' is indeed the minimal language such that $A'B = X$.
□

Symmetrically, we can prove

Lemma 3'. Let X and A be factorial languages on $\Sigma$. If there exists a factorial
language B such that X = AB, then there exists a unique minimal one, and it is
equal to $B' = L_{\Pi(A)}(B)$.

The following lemma is one of the main steps of the proof. 

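Lemma 4. For all factorial languages A and B, we have

$\overline{AB} \doteq \overline{R_{\Delta(B)}(A)} \cdot \overline{L_{\Pi(R_{\Delta(B)}(A))}(B)} \doteq \overline{R_{\Delta(L_{\Pi(A)}(B))}(A)} \cdot \overline{L_{\Pi(A)}(B)}$.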

Proof. We shall prove the first equality; the second one can be proved symmetrically.

Let us denote $R_{\Delta(B)}(A) = A'$ and $L_{\Pi(R_{\Delta(B)}(A))}(B) = B''$. Due to Lemma 3,
$A'B = AB$, and due to Lemma 3', $A'B'' = A'B$. So, $AB = A'B''$. Now note that
all entries of the canonical decomposition of a language are indecomposable. So,
to prove the required equality of canonical decompositions $\overline{AB} \doteq \overline{A'} \cdot \overline{B''}$, we must
prove only that no entry of the canonical decompositions $\overline{A'}$ or $\overline{B''}$ can be decreased
to get the same product.

Indeed, suppose we substituted an indecomposable entry of $\overline{A'}$ by its proper
factorial subset. Instead of A', we obtained its proper factorial subset $A_1$. Then
$A_1 B \subsetneq AB$ since A' is the minimal factorial language such that $A'B = AB$. But
$B'' \subseteq B$; so, $A_1 B'' \subseteq A_1 B \subsetneq AB$, and $A_1 B'' \neq AB$.

Now suppose we substituted an indecomposable entry of $\overline{B''}$ by its proper factorial
subset, and obtained a proper factorial subset $B_1$ of B''. Then $A'B_1 \neq A'B'' = AB$
since B'' is the minimal factorial set giving AB when catenated with A'.

So, no entry of $\overline{A'}$ or $\overline{B''}$ can be replaced by its proper subset without changing
the result AB. The equality is proved. □

Lemma 5. Let X and Y be factorial languages on $\Sigma$, and $\Delta \subseteq \Sigma$ be a subalphabet
such that $Y \not\subseteq \Delta^*$. Then $R_\Delta(XY) = X R_\Delta(Y)$.

Proof. Consider a word $u \in X R_\Delta(Y)$. If $u \in X$, let us choose a symbol $y \in Y$
from $\Sigma \setminus \Delta$. Then $uy \in XY \setminus XY\Delta \subseteq R_\Delta(XY)$, and thus $u \in R_\Delta(XY)$. If $u \notin X$,
then u = xu', where x is the longest prefix of u belonging to X and $u' \in R_\Delta(Y)$ is a
non-empty word. Let u'' be a word from $Y \setminus Y\Delta$ such that u' is its factor: u'' = su't
for some words s and t such that the last letter of t is from $\Sigma \setminus \Delta$. Then $u't \in Y \setminus Y\Delta$,
and hence $ut = xu't \in XY \setminus XY\Delta \subseteq R_\Delta(XY)$. It follows that $u \in R_\Delta(XY)$, and
the $\supseteq$ inclusion is proved.

To prove the $\subseteq$ inclusion, consider a word $u \in R_\Delta(XY)$. Let u' = sut be a
word from $XY \setminus XY\Delta$ whose factor is u, so that its last letter is from $\Sigma \setminus \Delta$. Then
$ut \in XY \setminus XY\Delta$. Let ut = xy, where $x \in X$ and $y \in Y$; then $y \in Y \setminus Y\Delta$ and
$ut \in X(Y \setminus Y\Delta)$. So, either $u \in X$, or u = xy' for some prefix y' of y: since
$y' \in R_\Delta(Y)$, in both cases we have $u \in X R_\Delta(Y)$, and the inclusion is proved. □

Symmetrically, we prove

Lemma 5'. Let X and Y be factorial languages on $\Sigma$, and $\Pi \subseteq \Sigma$ be a subalphabet
such that $X \not\subseteq \Pi^*$. Then $L_\Pi(XY) = L_\Pi(X)Y$.

The following series of lemmas is also an important part of the proof of the main result.

Lemma 6. Let X be a factorial language, $\Pi \subseteq \Sigma$ be a subalphabet, and $\Delta(X) \setminus \Pi \neq
\emptyset$. Then $L_\Pi(X) = X$.

Proof. Let $\sigma \in \Sigma$ be a symbol from $\Delta(X) \setminus \Pi$; then each word u from X can be
extended to $\sigma u \in X$ by the definition of $\Delta(X)$. So, $u \in \mathrm{Fac}(\sigma u) \subseteq \mathrm{Fac}(X \setminus \Pi X) =
L_\Pi(X)$. Since u was chosen arbitrarily, and $L_\Pi(X) \subseteq X$, we get the equality
$L_\Pi(X) = X$. □

The symmetric lemma is

Lemma 6'. Let X be a factorial language, $\Delta \subseteq \Sigma$ be a subalphabet, and $\Pi(X) \setminus \Delta \neq
\emptyset$. Then $R_\Delta(X) = X$.

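Lemma 7. For each factorial language X with $\overline{X} \doteq X_1 \cdots X_k$ we have

$$\overline{L_{\Delta(X)}(X)} \doteq \begin{cases} X_2 \cdots X_k, & \text{if } X_1 = \Delta^*(X), \\ \overline{X}, & \text{otherwise.} \end{cases}$$

Symmetrically,

$$\overline{R_{\Pi(X)}(X)} \doteq \begin{cases} X_1 \cdots X_{k-1}, & \text{if } X_k = \Pi^*(X), \\ \overline{X}, & \text{otherwise.} \end{cases}$$
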

Proof. We shall prove the first equality; the second one is symmetric. Let us
denote $\Delta(X) = \Delta$.

Suppose first that $X_1 \neq \Delta^*$, that is, $X_1 \supsetneq \Delta^*$. Due to Lemma 5', $L_\Delta(X) =
L_\Delta(X_1) X_2 \cdots X_k$. By the definitions, $X_1 = \Delta^* L_\Delta(X_1)$. But the language $X_1$ is
indecomposable and is not equal to $\Delta^*$, so, $X_1 = L_\Delta(X_1)$, and the equality $X =
L_\Delta(X)$ (and thus $\overline{L_\Delta(X)} \doteq \overline{X}$) is proved.

Now suppose that $X_1 = \Delta^*$. Then $L_\Delta(X) = L_\Delta(X_2 \cdots X_k)$ by the definition of
the operator $L_\Delta$, since all elements of $X \setminus X_2 \cdots X_k$ cannot occur in $L_\Delta(X)$ anyway.
Then, $L_\Delta(X_2) = X_2$ because otherwise we would have $X_1 X_2 = \Delta^* X_2 = \Delta^* Y$ for
some $Y = L_\Delta(X_2) \subsetneq X_2$, contradicting the minimality of the decomposition $\overline{X}$.
So, due to Lemma 5', $L_\Delta(X) = L_\Delta(X_2 \cdots X_k) = L_\Delta(X_2) X_3 \cdots X_k = X_2 \cdots X_k$.
The latter decomposition is minimal and thus canonical. □
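
4. Main result

Theorem 3. Let A and B be factorial languages with $\overline{A} \doteq A_1 \cdots A_k$ and $\overline{B} \doteq
B_1 \cdots B_m$. Let us denote $\Pi = \Pi(A)$ and $\Delta = \Delta(B)$. Then the canonical decomposition
of the catenation AB can be found as follows:

(1) If $\Delta \setminus \Pi \neq \emptyset$ and $\Pi \setminus \Delta \neq \emptyset$, then $\overline{AB} \doteq \overline{A} \cdot \overline{B}$.

(2) If $\Delta = \Pi$, and $A_k \neq \Delta^*$, $B_1 \neq \Delta^*$, then $\overline{AB} \doteq \overline{A} \cdot \overline{B}$.

(3) If $\Delta = \Pi$ and $A_k = \Delta^*$, then $\overline{AB} \doteq A_1 \cdots A_{k-1} \overline{B}$. Symmetrically, if $\Delta = \Pi$
and $B_1 = \Delta^*$, then $\overline{AB} \doteq \overline{A} B_2 \cdots B_m$.

(4) If $\Pi \subsetneq \Delta$, then $\overline{AB} \doteq \overline{R_\Delta(A)} \cdot \overline{B}$. Symmetrically, if $\Delta \subsetneq \Pi$, then $\overline{AB} \doteq
\overline{A} \cdot \overline{L_\Pi(B)}$.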

Proof. Cases (1) and (4) are obtained directly by applying Lemmas 6 and 6'
to the equality from Lemma 4. Case (2) is likewise obtained by applying Lemma 7
to Lemma 4.

At last, in Case (3), if $A_k = \Delta^*$, we apply Lemmas 7 and 2 to get $\overline{R_\Delta(A)} \doteq
A_1 \cdots A_{k-1}$ and $\Pi(R_\Delta(A)) = \Pi(A_{k-1})$. Assume that $\Pi(A_{k-1})$ includes $\Delta$ as a
subset. Then $A_{k-1} = A_{k-1} \Delta^*$, and $A = A_1 \cdots A_k = A_1 \cdots A_{k-1}$, contradicting
the fact that $\overline{A} \doteq A_1 \cdots A_{k-1} \Delta^*$. So, $\Delta \setminus \Pi(A_{k-1}) \neq \emptyset$, and we apply Lemma 6 to
get $L_{\Pi(A_{k-1})}(B) = B$. It remains to use Lemma 4 to get Case (3) of the Theorem.
□

Corollary 1. The canonical decomposition of AB either begins with $\overline{A}$, or ends
with $\overline{B}$, so that only one of the languages A and B can give canonical factors of
AB different from the canonical factors of the language itself.

Example 1. If $A = \{a, b\}^*$ and $B = \{a, c\}^*$, then $\Pi(A) = \{a, b\}$, $\Delta(B) = \{a, c\}$,
and the canonical decomposition of AB is just $\{a, b\}^* \cdot \{a, c\}^*$ (Case (1)).

Example 2. If $A = \mathrm{Fac}(\{a, ab\}^*)$ and $B = \mathrm{Fac}(\{a, ac\}^*)$, then $\Pi(A) = \Delta(B) = \{a\}$,
and the canonical decomposition of AB is just $\mathrm{Fac}(\{a, ab\}^*) \cdot \mathrm{Fac}(\{a, ac\}^*)$ (Case (2)).

Here A is the language of all words on {a, b} which do not contain two successive
b's, and B is the language of all words on {a, c} which do not contain two successive
c's.
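
The description of A in Example 2 can be confirmed on short words. The sketch below is an illustration only: `generated_star` is a hypothetical helper, and the generating set is enumerated one letter beyond the comparison bound `n`, because a word starting with b is a factor only of a longer word of {a, ab}*.

```python
from itertools import product

N = 7  # bound for enumerating {a, ab}*
n = 6  # languages are compared on words of length <= n

def fac(L):
    return {w[i:j] for w in L
            for i in range(len(w) + 1)
            for j in range(i, len(w) + 1)}

def generated_star(gens, bound):
    """All products of words from `gens` of total length <= bound."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = {w + g for w in frontier for g in gens
                    if len(w) + len(g) <= bound} - result
        result |= frontier
    return result

A = {w for w in fac(generated_star({"a", "ab"}, N)) if len(w) <= n}
no_bb = {"".join(p) for m in range(n + 1) for p in product("ab", repeat=m)
         if "bb" not in "".join(p)}
```

On words of length at most `n`, the two sets coincide; the case of B is identical with c in place of b.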

Example 3. If $A = a^*$ and $B = \mathrm{Fac}(\{a, ab\}^*)$, then $\Pi = \Delta = \{a\}$, and $\overline{AB} \doteq \overline{B}$
(Case (3)).

Example 4. Note that when $\Delta = \Pi$ and $A_k = B_1 = \Delta^*$, Case (3) may be
applied in either of the two directions. For example, if $A = a^*b^*$ and $B = b^*a^*$, then
$\overline{AB} \doteq a^* \cdot b^* \cdot a^*$, and it does not matter which of the occurrences of $b^*$ was removed.
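
The language equality behind Example 4 is easy to confirm on truncations. The sketch below checks only that the languages agree up to length `N` (an assumed bound); it does not verify that the decomposition is canonical. The helpers `star` and `cat` are hypothetical.

```python
from itertools import product

N = 8  # every language is truncated to words of length <= N

def star(letters):
    return {"".join(p) for n in range(N + 1) for p in product(letters, repeat=n)}

def cat(X, Y):
    return {x + y for x in X for y in Y if len(x) + len(y) <= N}

a, b = star("a"), star("b")
A = cat(a, b)               # a*b*
B = cat(b, a)               # b*a*
AB = cat(A, B)              # a*b*b*a*
reduced = cat(cat(a, b), a) # a*b*a*
```

Since b*b* = b*, one of the two adjacent copies of b* is redundant, and `AB` equals `reduced`.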

Before giving examples for Case (4), we will specify the form of the canonical
decomposition of $A' = R_\Delta(A)$. Recall that A is a factorial language with the
canonical decomposition $\overline{A} \doteq A_1 \cdots A_k$, and $\Delta$ is a subalphabet of $\Sigma$.

Let us define languages $A'_i$, $i = k, \ldots, 1$, as obtained by the following iterative
procedure: starting from $\Delta_k := \Delta$, we put for each i from k to 1

$A'_i = R_{\Delta_i}(A_i)$ and $\Delta_{i-1} = \Delta(A'_i)$, if $A_i \not\subseteq \Delta_i^*$,

$A'_i = \{\lambda\}$ and $\Delta_{i-1} = \Delta_i$, otherwise.

Lemma 8. The canonical decomposition of $A' = R_\Delta(A)$ can be obtained by deleting
extra $\{\lambda\}$ entries from the decomposition $A' = A'_1 \cdot A'_2 \cdots A'_k$.

Proof. First of all, note that due to Lemma 5 applied iteratively, $A' = A_1 \cdots A_{k-1} A'_k =
A_1 \cdots A_{k-2} A'_{k-1} A'_k = \ldots = A'_1 \cdots A'_k$. Some of the languages $A'_i$ can be equal to
$\{\lambda\}$; in particular, if $A \subseteq \Delta^*$, then $A' = \{\lambda\}$, as well as all its factors. However,
if $A' \neq \{\lambda\}$, then we can canonically decompose the factors $A'_i$ not equal to $\{\lambda\}$ and
erase the others.

Clearly, if we substitute any of the canonical factors of $A'_i$ by its proper subset, we
get a new language $A''_i \subsetneq A'_i$. So, to prove the lemma, we should just show that
$A' \neq A'_1 \cdots A'_{i-1} A''_i A'_{i+1} \cdots A'_k$ for any $A''_i \subsetneq A'_i$.

For all $i = 1, \ldots, k$, let us define $D_i = A'_1 \cdots A'_i$ and $E_{i-1} = A_1 \cdots A_{i-1}$. We also
define $D_0 = \{\lambda\}$. Note that by the definition and Lemma 3, for all $i \geq 1$, $D_i$ is the
minimal language such that $D_i A'_{i+1} \cdots A'_k = A'$. So, it remains to prove only that
$A'_i = A''_i$, where $A''_i$ is the minimal language such that $D_{i-1} A''_i = D_i$. By Lemma
3', we have $A''_i = L_{\Pi(D_{i-1})}(A'_i)$.

First, suppose that $D_{i-1} \neq E_{i-1}$. We knew that $D_i = D_{i-1} A'_i = E_{i-1} A'_i$, and
$D_{i-1}$ is the minimal language giving $D_i$ when catenated with $A'_i$. So, by Corollary
1, in the canonical decomposition of $D_i$ the factors corresponding to $A'_i$ do not
change, and $A'_i = A''_i$, which was to be proved.

Now suppose that $D_{i-1} = E_{i-1}$. Then $\Pi(D_{i-1}) = \Pi(E_{i-1})$. From
now on, we denote this subalphabet just by $\Pi'$. We knew that $A_i$ was equal to
$L_{\Pi'}(A_i)$ since it was the minimal factorial language giving $E_i$ when catenated with
$E_{i-1}$. Assume to the contrary that $A''_i = L_{\Pi'}(A'_i) \neq A'_i$.

Let us consider a word $u \in A'_i \setminus A''_i$. It does not belong to $A''_i$, which means that
$su \in A'_i$ implies $su \in \Pi'\Sigma^*$ for all $s \in \Sigma^*$ (in particular, u starts with a letter from
$\Pi'$). On the other hand, $u \in A'_i$, which means that $ut \in A_i \cap \Sigma^*(\Sigma \setminus \Delta_i)$ for some
$t \in \Sigma^*$. By the definition, $ut \in A'_i$, and the set of non-empty left extensions of ut
to elements of $A_i$ is a subset of that for u:

$\{s \in \Sigma^+ \mid sut \in A_i\} \subseteq \{s \in \Sigma^+ \mid su \in A'_i\} \subseteq \Pi'\Sigma^*$.

Since we already know that $\lambda u = u \in \Pi'\Sigma^*$, we see that $ut \notin L_{\Pi'}(A_i)$. So,
$A_i \neq L_{\Pi'}(A_i)$, contradicting the fact that the decomposition $\overline{E_i} \doteq A_1 \cdots A_i$
was minimal. We have found a contradiction to the assumption that $A'_i \neq A''_i$.

So, $A'_i = A''_i$, and the decomposition obtained from $A' = A'_1 \cdots A'_k$ by deleting
$\{\lambda\}$ entries is minimal, which was to be proved. □

To make the description complete, we state the symmetric lemma, for the case of
$\Delta \subsetneq \Pi$. Let B be a factorial language with $\overline{B} \doteq B_1 \cdots B_m$ and $\Pi$ be a subalphabet;
we start from $\Pi_1 = \Pi$ and successively define for each $j = 1, \ldots, m$

$B'_j = L_{\Pi_j}(B_j)$ and $\Pi_{j+1} = \Pi(B'_j)$, if $B_j \not\subseteq \Pi_j^*$,

$B'_j = \{\lambda\}$ and $\Pi_{j+1} = \Pi_j$, otherwise.

The lemma symmetric to Lemma 8 is

Lemma 8'. The canonical decomposition of $B' = L_\Pi(B)$ can be obtained by deleting
$\{\lambda\}$ entries from the decomposition $B' = B'_1 \cdot B'_2 \cdots B'_m$.

The following easy example for Case (4) of Theorem 3 illustrates Lemma 8.

Example 5. The canonical decomposition of $A = (a^*b^*)^k + (b^*a^*)^k$ is $\overline{A} \doteq (a^* +
b^*)^{2k}$ with $A_1 = \cdots = A_{2k} = (a^* + b^*)$ (here + denotes union). If we catenate
it with $B = a^*$, we get $A'_{2k} = b^*$, $A'_{2k-1} = a^*$, and so on, and at last obtain
$A' = (a^*b^*)^k$ and $\overline{AB} \doteq (a^* \cdot b^*)^k \cdot a^*$.
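
Example 5 can be replayed for k = 2 on truncated languages. This is a sketch under explicit assumptions, not a general implementation: all helper names are hypothetical, the right-to-left procedure is run with the update rule Δ_{i-1} = Δ(A'_i) (taken here by symmetry with the rule stated for the languages B'_j), and the bounded operators are exact for this example but may suffer from boundary effects on other languages.

```python
from itertools import product

N = 7  # all languages are truncated to words of length <= N

def star(letters):
    return {"".join(p) for m in range(N + 1) for p in product(letters, repeat=m)}

def cat(*langs):
    """Catenation of several languages, truncated to length <= N."""
    result = {""}
    for X in langs:
        result = {u + x for u in result for x in X if len(u) + len(x) <= N}
    return result

def fac(L):
    return {w[i:j] for w in L
            for i in range(len(w) + 1)
            for j in range(i, len(w) + 1)}

def delta_of(L, alphabet):
    """Bounded test for Delta(L): letters a with aw in L for all w in L, |w| < N."""
    return {a for a in alphabet if all(a + w in L for w in L if len(w) < N)}

def r_op(A, delta):
    """Bounded R_delta(A): factors of the words of A not ending with a letter of delta."""
    return fac({w for w in A if not w or w[-1] not in delta})

def primed_factors(factors, delta, alphabet):
    """Run the right-to-left procedure on the canonical factors A_1, ..., A_k."""
    primed = []
    for A_i in reversed(factors):
        if all(set(w) <= delta for w in A_i):       # A_i is contained in delta*
            primed.append({""})
        else:
            A_i_primed = r_op(A_i, delta)
            primed.append(A_i_primed)
            delta = delta_of(A_i_primed, alphabet)  # assumed update rule
    return primed[::-1]

k = 2
a, b = star("a"), star("b")
U = a | b                                  # a* + b*
A = cat(*[a, b] * k) | cat(*[b, a] * k)    # (a*b*)^k + (b*a*)^k
B = a                                      # a*
primed = primed_factors([U] * (2 * k), delta_of(B, "ab"), "ab")
```

The procedure returns the factors a*, b*, a*, b*, so that A' = (a*b*)^2, and the truncations of AB and of (a*b*)^2 a* coincide.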